Goto

Collaborating Authors

 system response


Automated Evaluation can Distinguish the Good and Bad AI Responses to Patient Questions about Hospitalization

arXiv.org Artificial Intelligence

Automated approaches to answer patient-posed health questions are rising, but selecting among systems requires reliable evaluation. The current gold standard for evaluating the free-text artificial intelligence (AI) responses--human expert review--is labor-intensive and slow, limiting scalability. Automated metrics are promising yet variably aligned with human judgments and often context-dependent. To address the feasibility of automating the evaluation of AI responses to hospitalization-related questions posed by patients, we conducted a large systematic study of evaluation approaches. Across 100 patient cases, we collected responses from 28 AI systems (2800 total) and assessed them along three dimensions: whether a system response (1) answers the question, (2) appropriately uses clinical note evidence, and (3) uses general medical knowledge. Using clinician-authored reference answers to anchor metrics, automated rankings closely matched expert ratings. Our findings suggest that carefully designed automated evaluation can scale comparative assessment of AI systems and support patient-clinician communication.


LLMs-guided adaptive compensator: Bringing Adaptivity to Automatic Control Systems with Large Language Models

arXiv.org Artificial Intelligence

-- With rapid advances in code generation, reasoning, and problem-solving, Large Language Models (LLMs) are increasingly applied in robotics, most existing work focuses on high-level tasks such as task decomposition. A few studies have explored the use of LLMs in feedback controller design, however, these efforts are restricted to overly simplified systems, fixed-structure gain tuning, and lack real-world validation. To further investigate LLMs in automatic control, this work targets a key subfield: adaptive control. Inspired by the framework of model reference adaptive control (MRAC), we propose an LLMs-guided adaptive compensator framework that avoids designing controllers from scratch. Instead, the LLMs are prompted using the discrepancies between an unknown system and a reference system to design a compensator that aligns the response of the unknown system with that of the reference, thereby achieving adaptivity. Experiments evaluate five methods--LLM-guided adaptive compensator, LLM-guided adaptive controller, indirect adaptive control, learning-based adaptive control, and MRAC--on soft and humanoid robots, in both simulated and real-world environments. Results show that the LLMs-guided adaptive com-pensator outperforms traditional adaptive controllers and significantly reduces reasoning complexity compared to the LLMs-guided adaptive controller. The Lyapunov-based analysis and reasoning-path inspection demonstrate that the LLMs-guided adaptive compensator enables a more structured design process by transforming mathematical derivation into a reasoning task, while exhibiting strong generalizability, adaptability, and robustness. This study opens a new direction for applying LLMs in the field of automatic control, offering greater deployability and practicality compared to vision-language models.


PicPersona-TOD : A Dataset for Personalizing Utterance Style in Task-Oriented Dialogue with Image Persona

arXiv.org Artificial Intelligence

Task-Oriented Dialogue (TOD) systems are designed to fulfill user requests through natural language interactions, yet existing systems often produce generic, monotonic responses that lack individuality and fail to adapt to users' personal attributes. To address this, we introduce PicPersona-TOD, a novel dataset that incorporates user images as part of the persona, enabling personalized responses tailored to user-specific factors such as age or emotional context. This is facilitated by first impressions, dialogue policy-guided prompting, and the use of external knowledge to reduce hallucinations. Human evaluations confirm that our dataset enhances user experience, with personalized responses contributing to a more engaging interaction. Additionally, we introduce a new NLG model, Pictor, which not only personalizes responses, but also demonstrates robust performance across unseen domains https://github.com/JihyunLee1/PicPersona.


chebgreen: Learning and Interpolating Continuous Empirical Green's Functions from Data

arXiv.org Artificial Intelligence

They allow us to mathematically express the governing fundamental laws, as rate forms [1]. Researchers have worked on data-driven approaches and deep learning techniques to solve PDEs for various systems [2, 3, 4, 5, 6, 7]. Although they hold significant practical importance, the exact PDEs governing many important phenomena remain unknown across fields, including physics, chemistry, biology, and materials science. Thus, there also has been significant work in discovering [8, 9, 10, 11, 12, 13, 14, 15, 16] the underlying PDEs from observational data. Operator Learning represents another significant research avenue; aiming to approximate the solution operator associated with a hidden partial differential equation [17]. Notable contributions in this field include, but are not limited to, Deep Operator Networks [18], Fourier Neural Operators [19], Graph Neural Operator [20], Multipole Graph Neural Operator [21], and Geometry-informed Neural Operator [22]. A detailed overview of the topic can be found in the guide on operator learning [17].


Improving Multi-Domain Task-Oriented Dialogue System with Offline Reinforcement Learning

arXiv.org Artificial Intelligence

Task-oriented dialogue (TOD) system is designed to accomplish user-defined tasks through dialogues. The TOD system has progressed towards end-to-end modeling by leveraging pre-trained large language models. Fine-tuning the pre-trained language models using only supervised learning leads to the exposure bias and token loss problem and it deviates the models from completing the user's task. To address these issues, we propose a TOD system that leverages a unified pre-trained language model, GPT2, as a base model. It is optimized using supervised learning and reinforcement learning (RL). The issues in the TOD system are mitigated using a non-differentiable reward function. The reward is calculated using the weighted sum of the success rate and BLEU evaluation metrics. The success rate and BLEU metrics in reward calculation guide the language model for user task completion while ensuring a coherent and fluent response. Our model is acquired by fine-tuning a pre-trained model on the dialogue-session level which comprises user utterance, belief state, system act, and system response. Experimental results on MultiWOZ2.1 demonstrate that our model increases the inform rate by 1.60% and the success rate by 3.17% compared to the baseline.


Structured List-Grounded Question Answering

arXiv.org Artificial Intelligence

Document-grounded dialogue systems aim to answer user queries by leveraging external information. Previous studies have mainly focused on handling free-form documents, often overlooking structured data such as lists, which can represent a range of nuanced semantic relations. Motivated by the observation that even advanced language models like GPT-3.5 often miss semantic cues from lists, this paper aims to enhance question answering (QA) systems for better interpretation and use of structured lists. To this end, we introduce the LIST2QA dataset, a novel benchmark to evaluate the ability of QA systems to respond effectively using list information. This dataset is created from unlabeled customer service documents using language models and model-based filtering processes to enhance data quality, and can be used to fine-tune and evaluate QA models. Apart from directly generating responses through fine-tuned models, we further explore the explicit use of Intermediate Steps for Lists (ISL), aligning list items with user backgrounds to better reflect how humans interpret list items before generating responses. Our experimental results demonstrate that models trained on LIST2QA with our ISL approach outperform baselines across various metrics. Specifically, our fine-tuned Flan-T5-XL model shows increases of 3.1% in ROUGE-L, 4.6% in correctness, 4.5% in faithfulness, and 20.6% in completeness compared to models without applying filtering and the proposed ISL method.


A method for identifying causality in the response of nonlinear dynamical systems

arXiv.org Artificial Intelligence

Predicting the response of nonlinear dynamical systems subject to random, broadband excitation is important across a range of scientific disciplines, such as structural dynamics and neuroscience. Building data-driven models requires experimental measurements of the system input and output, but it can be difficult to determine whether inaccuracies in the model stem from modelling errors or noise. This paper presents a novel method to identify the causal component of the input-output data from measurements of a system in the presence of output noise, as a function of frequency, without needing a high fidelity model. An output prediction, calculated using an available model, is optimally combined with noisy measurements of the output to predict the input to the system. The parameters of the algorithm balance the two output signals and are utilised to calculate a nonlinear coherence metric as a measure of causality. This method is applicable to a broad class of nonlinear dynamical systems. There are currently no solutions to this problem in the absence of a complete benchmark model.


Damage detection in an uncertain nonlinear beam based on stochastic Volterra series: an experimental application

arXiv.org Artificial Intelligence

The damage detection problem becomes a more difficult task when the intrinsically nonlinear behavior of the structures and the natural data variation are considered in the analysis because both phenomena can be confused with damage if linear and deterministic approaches are implemented. Therefore, this work aims the experimental application of a stochastic version of the Volterra series combined with a novelty detection approach to detect damage in an initially nonlinear system taking into account the measured data variation, caused by the presence of uncertainties. The experimental setup is composed by a cantilever beam operating in a nonlinear regime of motion, even in the healthy condition, induced by the presence of a magnet near to the free extremity. The damage associated with mass changes in a bolted connection (nuts loosed) is detected based on the comparison between linear and nonlinear contributions of the stochastic Volterra kernels in the total response, estimated in the reference and damaged conditions. The experimental measurements were performed on different days to add natural variation to the data measured. The results obtained through the stochastic proposed approach are compared with those obtained by the deterministic version of the Volterra series, showing the advantage of the stochastic model use when we consider the experimental data variation with the capability to detect the presence of the damage with statistical confidence. Besides, the nonlinear metric used presented a higher sensitivity to the occurrence of the damage compared with the linear one, justifying the application of a nonlinear metric when the system exhibits intrinsically nonlinear behavior.


Damage detection in an uncertain nonlinear beam based on stochastic Volterra series

arXiv.org Artificial Intelligence

The damage detection problem in mechanical systems, using vibration measurements, is commonly called Structural Health Monitoring (SHM). Many tools are able to detect damages by changes in the vibration pattern, mainly, when damages induce nonlinear behavior. However, a more difficult problem is to detect structural variation associated with damage, when the mechanical system has nonlinear behavior even in the reference condition. In these cases, more sophisticated methods are required to detect if the changes in the response are based on some structural variation or changes in the vibration regime, because both can generate nonlinearities. Among the many ways to solve this problem, the use of the Volterra series has several favorable points, because they are a generalization of the linear convolution, allowing the separation of linear and nonlinear contributions by input filtering through the Volterra kernels. On the other hand, the presence of uncertainties in mechanical systems, due to noise, geometric imperfections, manufacturing irregularities, environmental conditions, and others, can also change the responses, becoming more difficult the damage detection procedure. An approach based on a stochastic version of Volterra series is proposed to be used in the detection of a breathing crack in a beam vibrating in a nonlinear regime of motion, even in reference condition (without crack). The system uncertainties are simulated by the variation imposed in the linear stiffness and damping coefficient. The results show, that the nonlinear analysis done, considering the high order Volterra kernels, allows the approach to detect the crack with a small propagation and probability confidence, even in the presence of uncertainties.


Towards Leveraging Large Language Models for Automated Medical Q&A Evaluation

arXiv.org Artificial Intelligence

This paper explores the potential of using Large Language Models (LLMs) to automate the evaluation of responses in medical Question and Answer (Q\&A) systems, a crucial form of Natural Language Processing. Traditionally, human evaluation has been indispensable for assessing the quality of these responses. However, manual evaluation by medical professionals is time-consuming and costly. Our study examines whether LLMs can reliably replicate human evaluations by using questions derived from patient data, thereby saving valuable time for medical experts. While the findings suggest promising results, further research is needed to address more specific or complex questions that were beyond the scope of this initial investigation.